# Loading packages
import PIL
import numpy as np
import matplotlib.pyplot as plt
# scipy.ndimage.filters is a deprecated namespace; import from scipy.ndimage directly.
from scipy.ndimage import gaussian_filter1d
import random
import torch
import torchvision
import torch.autograd as autograd
import torchvision.transforms as T
For this project, you will use a pre-trained deep neural network, SqueezeNet, which is lightweight and runs fast on CPUs. Run the code below to load a pre-trained SqueezeNet from the PyTorch official model zoo.
# Test and set the device.
if torch.cuda.is_available():
    device = 'cuda:0'
else:
    device = 'cpu'
print('Use', device)
# Download and load the pretrained SqueezeNet model.
model = torchvision.models.squeezenet1_1(pretrained=True).to(device)
# Disable the gradient computation with respect to model parameters.
for param in model.parameters():
    param.requires_grad = False
For Task#1 and Task#2, use the images in the folder Project1\images, where each filename is the corresponding class label. For example, 182.png is an image of a Border Terrier, which is class 182 in the ImageNet dataset. Please refer to this Gist snippet for a complete list. The images are from the ImageNet validation set, so the pre-trained model has never "seen" them.
For Task#3, you may use the images in folder Project1\style or any other images you like.
Most pre-trained models are trained on images that had been preprocessed by subtracting the per-color mean and dividing by the per-color standard deviation. Here are a few helper functions for performing and undoing this preprocessing.
IMAGENET_MEAN = np.array([0.485, 0.456, 0.406])
IMAGENET_STD = np.array([0.229, 0.224, 0.225])
def preprocess(img, size=(224, 224)):
    # PIL image -> normalized tensor of shape (1, 3, H, W)
    transform = T.Compose([
        T.Resize(size),
        T.ToTensor(),
        T.Normalize(mean=IMAGENET_MEAN.tolist(),
                    std=IMAGENET_STD.tolist()),
        T.Lambda(lambda x: x[None]),
    ])
    return transform(img)

def deprocess(img, should_rescale=True):
    # Normalized tensor of shape (1, 3, H, W) -> PIL image
    transform = T.Compose([
        T.Lambda(lambda x: x[0]),
        T.Normalize(mean=[0, 0, 0], std=(1.0 / IMAGENET_STD).tolist()),
        T.Normalize(mean=(-IMAGENET_MEAN).tolist(), std=[1, 1, 1]),
        T.Lambda(rescale) if should_rescale else T.Lambda(lambda x: x),
        T.ToPILImage(),
    ])
    return transform(img)

def rescale(x):
    # Linearly map the tensor values to [0, 1]
    low, high = x.min(), x.max()
    x_rescaled = (x - low) / (high - low)
    return x_rescaled

def blur_image(X, sigma=1):
    # Gaussian blur along the H and W axes, written back into X in place
    X_np = X.cpu().clone().numpy()
    X_np = gaussian_filter1d(X_np, sigma, axis=2)
    X_np = gaussian_filter1d(X_np, sigma, axis=3)
    X.copy_(torch.Tensor(X_np).type_as(X))
    return X
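As a quick sanity check on the preprocessing arithmetic, here is a minimal NumPy-only sketch of the per-channel normalization and its inverse. The names `normalize` and `denormalize` are hypothetical stand-ins for the `T.Normalize` steps inside `preprocess` and `deprocess`; they are not part of the notebook.

```python
import numpy as np

IMAGENET_MEAN = np.array([0.485, 0.456, 0.406])
IMAGENET_STD = np.array([0.229, 0.224, 0.225])

def normalize(img):
    """Hypothetical helper. img: float array in [0, 1] with shape (3, H, W)."""
    return (img - IMAGENET_MEAN[:, None, None]) / IMAGENET_STD[:, None, None]

def denormalize(x):
    """Hypothetical helper. Undo normalize(): multiply by std, then add mean."""
    return x * IMAGENET_STD[:, None, None] + IMAGENET_MEAN[:, None, None]

rng = np.random.default_rng(0)
img = rng.random((3, 4, 4))          # toy "image" with values in [0, 1)
restored = denormalize(normalize(img))
print(np.allclose(img, restored))    # the round trip recovers the input
```

A pixel value of 0 maps to `-mean / std` and a value of 1 maps to `(1 - mean) / std` in each channel, which is where the clamp bounds used later in the class visualization come from.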
The concept of "image gradients" can be used to study the stability of a network. Consider a state-of-the-art deep neural network that generalizes well on an object recognition task. We expect such a network to be robust to small perturbations of its input, because small perturbations cannot change the object category of an image. However, the following paper [1] showed that an imperceptible, non-random perturbation applied to a test image can arbitrarily change the network's prediction.
[1] Szegedy et al, "Intriguing properties of neural networks", ICLR 2014
Given an image and a target class, we can perform gradient ascent over the image to maximize the target class, stopping when the network classifies the image as the target class. While the perturbations seem negligible to humans, the network would classify the perturbed images wrongly.
Read the paper, and then implement the following function make_adversarial_attack to generate "fooling images". For each image in Project1/images with class label $c$, generate a fooling image that will be classified into class $ c-1-d $ where $d$ is the last digit of your student number. Save each fooling image into the folder Project1/fooling_images with the filename {true_class}_{target_class}.png. You may confirm (optional) that the fooling image 182_9.png in the folder will be wrongly classified as ostrich (class 9 in ImageNet dataset).
For the image 182.png, show the difference map between the original image and the fooling image, and save it as 182_1x_diff.png. Magnify the difference by 10 times and save the resulting map as 182_10x_diff.png.
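The update rule used in this task (step along the L2-normalized gradient of the target score until the prediction flips) can be illustrated on a toy problem. The sketch below is an assumption-laden stand-in: it replaces SqueezeNet with a hypothetical 3-class linear model `scores = W @ x`, whose gradient with respect to `x` is simply a row of `W`.

```python
import numpy as np

# Hypothetical toy "network": scores = W @ x (not the notebook's model).
W = np.array([[1.0, 0.0, 0.0],
              [0.0, 1.0, 0.0],
              [0.6, 0.0, 0.8]])
x = np.array([2.0, 0.5, -1.2])   # initially classified as class 0
target = 2

x_fool = x.copy()
learning_rate = 1.0
for step in range(100):
    scores = W @ x_fool
    if int(np.argmax(scores)) == target:
        break                                  # the model is fooled
    g = W[target]                              # d scores[target] / d x
    x_fool += learning_rate * g / np.linalg.norm(g)

print('fooled after', step, 'steps; perturbation norm:',
      np.linalg.norm(x_fool - x))
```

Because every step has norm `learning_rate`, the total perturbation is bounded by the number of steps times the learning rate, which makes it easy to reason about how far the fooling image has moved from the original.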
def make_adversarial_attack(X, target_y, model):
    """
    Generate a fooling image that is close to X, but that the model classifies
    as target_y.
    Inputs:
    - X: Input image; Tensor of shape (1, 3, 224, 224)
    - target_y: An integer in the range [0, 1000)
    - model: A pretrained CNN
    Returns:
    - X_fooling: An image that is close to X, but that is classified as target_y
      by the model.
    """
    model.eval()
    # Initialize our fooling image to the input image
    X_fooling = X.clone().detach()
    X_fooling.requires_grad = True
    # you may change the learning rate and max_iter
    learning_rate = 1
    max_iter = 100
    ##############################################################################
    # TODO: Generate a fooling image X_fooling that the model will classify as   #
    # the class target_y. You should perform gradient ascent on the score of the #
    # target class, stopping when the model is fooled.                           #
    # When computing an update step, first normalize the gradient:               #
    #   dX = learning_rate * g / ||g||_2                                         #
    ##############################################################################
    # your code
    for i in range(max_iter):
        output = model(X_fooling)
        # Stop as soon as the model predicts the target class
        if output.argmax(dim=1).item() == target_y:
            print('Target label', target_y, 'reached after', i, 'iterations', '\n')
            break
        # Raw (unnormalized) score of the target class
        score = output[0, target_y]
        score.backward()
        # Gradient ascent step along the L2-normalized gradient
        g = X_fooling.grad / torch.norm(X_fooling.grad)
        with torch.no_grad():
            X_fooling = X_fooling + learning_rate * g
        # The update created a new tensor, so re-enable gradient tracking
        X_fooling.requires_grad = True
    ##############################################################################
    #                             END OF YOUR CODE                               #
    ##############################################################################
    X_fooling = X_fooling.detach()
    return X_fooling
##############################################################################
# TODO: 1. Compute the fooling images for the images under `Project1/images`.#
# 2. Show the 4 related images of the image '182.png': original image, #
# fooling image, 182_1x_diff.png and 182_10x_diff.png. #
##############################################################################
# your code
filenames = ['85', '100', '182', '294', '366', '662']
for filename in filenames:
    print('Fooling image', filename)
    image_orig = PIL.Image.open('images/' + filename + '.png')
    image_t = preprocess(image_orig).to(device)
    ### Generate fooling image
    target_y = 2
    image_fool = make_adversarial_attack(image_t, target_y, model)
    image_t = image_t.cpu()
    image_fool = image_fool.cpu()
    image_fool = deprocess(image_fool)
    ## Generate diff images (offset by 127 so that zero difference is mid-grey)
    image_orig_arr = np.array(deprocess(image_t)).astype(float)
    image_fool_arr = np.array(image_fool).astype(float)
    image_diff_arr = image_fool_arr - image_orig_arr + 127
    image_diff_10_arr = 10 * (image_fool_arr - image_orig_arr) + 127
    # Clip to [0, 255] before casting; out-of-range values do not cast safely to uint8
    image_diff = PIL.Image.fromarray(np.clip(image_diff_arr, 0, 255).astype(np.uint8))
    image_diff_10 = PIL.Image.fromarray(np.clip(image_diff_10_arr, 0, 255).astype(np.uint8))
    ### Save images
    image_fool.save('fooling_images/' + filename + '_' + str(target_y) + '.png')
    image_diff.save('fooling_images/' + filename + '_1x_diff.png')
    image_diff_10.save('fooling_images/' + filename + '_10x_diff.png')
    ### Plot results
    fig, axs = plt.subplots(1, 4, figsize=(15, 5))
    axs[0].imshow(image_orig)
    axs[0].set_title('Original Image')
    axs[1].imshow(image_fool)
    axs[1].set_title('Fooled Image')
    axs[2].imshow(image_diff)
    axs[2].set_title('Image Diff x1')
    axs[3].imshow(image_diff_10)
    axs[3].set_title('Image Diff x10')
#### Optional task to generate the ostrich image
image_orig = PIL.Image.open('images/182.png')
image_t = preprocess(image_orig).to(device)
### Generate fooling image
print('Generating ostrich fooling image for 182')
target_y = 9
image_fool = make_adversarial_attack(image_t, target_y, model)
### Check classification of image_fool
output = model(image_fool)
(val, ind) = torch.max(output, 1)
# Report the class the model actually predicts, not the intended target
print('Fooling image classified as', ind.item())
# image_rgb = np.array(deprocess(image_t)).astype(float)
# image_fool_rgb = np.array(image_fool).astype(float)
# image_diff_rgb = image_fool_rgb-image_rgb+127
# image_diff_10_rgb = 10*(image_fool_rgb-image_rgb)+127
# fig, axs = plt.subplots(1, 4,figsize=(15,5))
# axs[0].imshow(image_rgb.astype(int))
# axs[0].set_title('Original Image')
# axs[1].imshow(image_fool_rgb.astype(int))
# axs[1].set_title('Fooled Image')
# axs[2].imshow(image_diff_rgb.astype(int))
# axs[2].set_title('Diff Image')
# axs[3].imshow(image_diff_10_rgb.astype(int))
# axs[3].set_title('Diff Image')
# print(image_diff_10.shape)
# print(image_fool.data)
# print(image_t.data)
# print(image_diff.data)
# print(deprocess(image_diff_10.data.clone().cpu()))
##############################################################################
# END OF YOUR CODE #
##############################################################################
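The difference maps add an offset of 127 so that "no change" appears as mid-grey, and the 10x map multiplies the difference before adding the offset. The values can then leave the 0 to 255 range, so it is safer to clip before casting to `uint8`. The sketch below uses small made-up arrays (not real image data) to show the arithmetic.

```python
import numpy as np

# Made-up 2x2 "images" purely for illustration.
orig = np.array([[100.0, 200.0], [0.0, 255.0]])
fool = np.array([[103.0, 190.0], [20.0, 230.0]])

diff = fool - orig                     # [[3, -10], [20, -25]]
# Offset by 127, clip to the valid range, then cast.
diff_1x = np.clip(diff + 127, 0, 255).astype(np.uint8)
diff_10x = np.clip(10 * diff + 127, 0, 255).astype(np.uint8)

print(diff_1x)    # small differences stay close to mid-grey
print(diff_10x)   # magnified differences saturate at 0 or 255
```

Without the clip, values such as 327 or -123 would be cast to `uint8` directly, and the result of such an out-of-range cast is not something to rely on.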
By starting with a random noise image and performing gradient ascent on a target class, we can generate an image that the network will recognize as the target class. This idea was first presented in [2]; [3] extended this idea by suggesting several regularization techniques that can improve the quality of the generated image.
Concretely, let $I$ be an image and let $y$ be a target class. Let $s_y(I)$ be the score that a convolutional network assigns to the image $I$ for class $y$; note that these are raw unnormalized scores, not class probabilities. We wish to generate an image $I^*$ that achieves a high score for the class $y$ by solving the problem
$$ I^* = \arg\max_I (s_y(I) - R(I)) $$
where $R$ is a (possibly implicit) regularizer (note the sign of $R(I)$ in the argmax: we want to minimize this regularization term). We can solve this optimization problem using gradient ascent, computing gradients with respect to the generated image. We will use (explicit) L2 regularization of the form
$$ R(I) = \lambda \|I\|_2^2 $$
and implicit regularization, as suggested by [3], by periodically blurring the generated image.
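For reference, the gradient of this objective and one plain gradient ascent step can be written out as follows. The step size $\eta$ is a symbol introduced here for illustration; an implementation may additionally normalize the gradient or scale the two terms differently.

```latex
\nabla_I \big( s_y(I) - \lambda \|I\|_2^2 \big)
    = \nabla_I s_y(I) - 2 \lambda I
\qquad\Longrightarrow\qquad
I \leftarrow I + \eta \, \big( \nabla_I s_y(I) - 2 \lambda I \big)
```

The regularizer enters with a minus sign, so it shrinks the image towards zero while the score term pushes it towards the target class.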
In the cell below, complete the implementation of the create_class_visualization function.
def jitter(X, ox, oy):
    """
    Helper function to randomly jitter an image.
    Inputs
    - X: PyTorch Tensor of shape (N, C, H, W)
    - ox, oy: Integers giving number of pixels to jitter along W and H axes
    Returns: A new PyTorch Tensor of shape (N, C, H, W)
    """
    if ox != 0:
        left = X[:, :, :, :-ox]
        right = X[:, :, :, -ox:]
        X = torch.cat([right, left], dim=3)
    if oy != 0:
        top = X[:, :, :-oy]
        bottom = X[:, :, -oy:]
        X = torch.cat([bottom, top], dim=2)
    return X
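The jitter helper is a circular shift: pixels pushed off one edge wrap around to the other, so calling it again with negated offsets undoes the shift exactly. The following NumPy sketch mirrors the same slicing logic (the name `jitter_np` is hypothetical) and compares it with `np.roll`.

```python
import numpy as np

def jitter_np(X, ox, oy):
    """NumPy mirror of the jitter helper: circular shift along W (ox) and H (oy)."""
    if ox != 0:
        X = np.concatenate([X[:, :, :, -ox:], X[:, :, :, :-ox]], axis=3)
    if oy != 0:
        X = np.concatenate([X[:, :, -oy:], X[:, :, :-oy]], axis=2)
    return X

X = np.arange(2 * 3 * 4 * 5).reshape(2, 3, 4, 5)
shifted = jitter_np(X, 2, 1)
# Same result as rolling by oy along H (axis 2) and ox along W (axis 3)
same_as_roll = np.array_equal(shifted, np.roll(X, shift=(1, 2), axis=(2, 3)))
# Negated offsets undo the shift
restored = jitter_np(shifted, -2, -1)
print(same_as_roll, np.array_equal(restored, X))
```

This is why the visualization loop can jitter before the gradient step and then call the helper with `-ox, -oy` afterwards without losing any pixels.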
def create_class_visualization(target_y, model, device, **kwargs):
    '''
    Generate an image to maximize the score of target_y under a pretrained model.
    Inputs:
    - target_y: A list of two elements, where the first value is an integer in the range [0, 1000) giving the index of the
      class, and the second value is the name of the class.
    - model: A pretrained CNN that will be used to generate the image
    - device: Torch device to use for computations
    Keyword arguments:
    - l2_reg: Strength of L2 regularization on the image
    - learning_rate: How big of a step to take
    - num_iterations: How many iterations to use
    - blur_every: How often to blur the image as an implicit regularizer
    - max_jitter: How much to jitter the image as an implicit regularizer
    - show_every: How often to show the intermediate result
    '''
    model.to(device)
    l2_reg = kwargs.pop('l2_reg', 1e-3)
    learning_rate = kwargs.pop('learning_rate', 25)
    num_iterations = kwargs.pop('num_iterations', 100)
    blur_every = kwargs.pop('blur_every', 10)
    max_jitter = kwargs.pop('max_jitter', 16)
    show_every = kwargs.pop('show_every', 25)
    # Randomly initialize the image as a PyTorch Tensor, and make it require gradients.
    img = torch.randn(1, 3, 224, 224).mul_(1.0).to(device).requires_grad_()
    for t in range(num_iterations):
        # Randomly jitter the image a bit; this gives slightly nicer results
        ox, oy = random.randint(0, max_jitter), random.randint(0, max_jitter)
        img.data.copy_(jitter(img.data, ox, oy))
        ########################################################################
        # TODO: Use the model to compute the gradient of the score for the     #
        # class target_y with respect to the pixels of the image, and make a   #
        # gradient step on the image using the learning rate. Don't forget the #
        # L2 regularization term!                                              #
        # Be very careful about the signs of elements in your code.            #
        ########################################################################
        # your code
        output_label = model(img)
        ## Compute the raw score of the target class
        score = output_label[0][target_y[0]]
        score.backward()
        g = img.grad / torch.norm(img.grad)
        ### Gradient ascent on the score, with the L2 term shrinking the image
        with torch.no_grad():
            img = img + learning_rate * g - l2_reg * img
        # The update created a new tensor, so re-enable gradient tracking
        img.requires_grad = True
        ########################################################################
        #                             END OF YOUR CODE                         #
        ########################################################################
        # Undo the random jitter
        img.data.copy_(jitter(img.data, -ox, -oy))
        # As regularizer, clamp and periodically blur the image
        for c in range(3):
            lo = float(-IMAGENET_MEAN[c] / IMAGENET_STD[c])
            hi = float((1.0 - IMAGENET_MEAN[c]) / IMAGENET_STD[c])
            img.data[:, c].clamp_(min=lo, max=hi)
        if t % blur_every == 0:
            blur_image(img.data, sigma=0.5)
        # Periodically show the image
        if t == 0 or (t + 1) % show_every == 0 or t == num_iterations - 1:
            plt.imshow(deprocess(img.data.clone().cpu()))
            class_name = target_y[1]
            plt.title('%s\nIteration %d / %d' % (class_name, t + 1, num_iterations))
            plt.gcf().set_size_inches(4, 4)
            plt.axis('off')
            plt.show()
    return deprocess(img.data.cpu())
Once you have completed the implementation in the cell above, run the following cell to generate an image of a Tarantula:
target_y = [76, "Tarantula"]
# target_y = [366, "Gorilla"]
out = create_class_visualization(target_y, model, device)
Another task closely related to image gradients is style transfer, which has become a popular application of deep learning in computer vision. You need to study and implement the style transfer technique presented in the following paper [4], where the general idea is to take two images (a content image and a style image) and produce a new image that reflects the content of one but the artistic "style" of the other.
Below is an example.

To perform style transfer, you will need to first formulate a special loss function that matches the content and style of each respective image in the feature space, and then perform gradient descent on the pixels of the image itself.
The loss function contains two parts: content loss and style loss. Read the paper [4] for details about the losses and implement them below.
def content_loss(content_weight, content_current, content_original):
    """
    Compute the content loss for style transfer.
    Inputs:
    - content_weight: Scalar giving the weighting for the content loss.
    - content_current: features of the current image; this is a PyTorch Tensor of shape
      (1, C_l, H_l, W_l).
    - content_original: features of the content image, Tensor with shape (1, C_l, H_l, W_l).
    Returns:
    - scalar content loss
    """
    ##############################################################################
    # TODO: Implement content loss function                                      #
    # Note: It should not be very much code (less than 10 lines)                 #
    ##############################################################################
    # Weighted sum of squared differences between the two feature maps
    loss = torch.nn.MSELoss(reduction='sum')
    loss_content = content_weight * loss(content_current, content_original)
    return loss_content
##############################################################################
# END OF YOUR CODE #
##############################################################################
def gram_matrix(features):
    """
    Compute the normalized Gram matrix from features.
    The Gram matrix will be used to compute style loss.
    Inputs:
    - features: PyTorch Tensor of shape (N, C, H, W) giving features for
      a batch of N images.
    Returns:
    - gram: PyTorch Tensor of shape (N, C, C) giving the
      normalized Gram matrices for the N input images.
    """
    ##############################################################################
    # TODO: Implement the normalized Gram matrix computation function            #
    # Note: It should not be very much code (less than 10 lines)                 #
    ##############################################################################
    N, C, H, W = features.size()
    # Flatten the spatial dimensions: one row of length H*W per channel
    flat = features.view(N, C, H * W)
    # Per-image Gram matrix of channel correlations, shape (N, C, C)
    G = torch.bmm(flat, flat.transpose(1, 2))
    # For N = 1 this matches the previous normalization by 2 * N * C * H * W
    return G.div(2 * C * H * W)
##############################################################################
# END OF YOUR CODE #
##############################################################################
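To make the Gram matrix concrete, here is a small NumPy sketch on a hand-checkable feature map. The helper name `gram_matrix_np` and the normalization by `C * H * W` are assumptions for illustration; the paper and the notebook code may use a different constant.

```python
import numpy as np

def gram_matrix_np(features):
    """Hypothetical NumPy sketch: (N, C, H, W) features -> (N, C, C) Gram matrices."""
    n, c, h, w = features.shape
    flat = features.reshape(n, c, h * w)       # one row per channel
    gram = flat @ flat.transpose(0, 2, 1)      # channel-by-channel inner products
    return gram / (c * h * w)                  # assumed normalization

# One image, two channels, a 1x2 spatial grid.
features = np.array([[[[1.0, 2.0]],
                      [[3.0, 4.0]]]])
gram = gram_matrix_np(features)
print(gram[0])
# Unnormalized entries: [[1*1+2*2, 1*3+2*4], [3*1+4*2, 3*3+4*4]] = [[5, 11], [11, 25]]
```

Entry `(i, j)` measures how strongly channels `i` and `j` fire together across all spatial positions, which is why the matrix captures texture statistics but discards spatial layout.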
def style_loss(feats, style_layers, style_targets, style_weights):
    """
    Computes the style loss at a set of layers.
    Inputs:
    - feats: list of the features at every layer of the current image.
    - style_layers: List of layer indices into feats giving the layers to include in the
      style loss.
    - style_targets: List of the same length as style_layers, where style_targets[i] is
      a PyTorch Tensor giving the features of the source style image at layer
      style_layers[i]; the Gram matrix is computed from it inside this function.
    - style_weights: List of the same length as style_layers, where style_weights[i]
      is a scalar giving the weight for the style loss at layer style_layers[i].
    Returns:
    - style_loss: A PyTorch Tensor holding a scalar giving the style loss.
    """
    ##############################################################################
    # TODO: Implement style loss function                                        #
    # Note: It should not be very much code (less than 10 lines)                 #
    ##############################################################################
    # your code
    loss = torch.nn.MSELoss(reduction='sum')
    losses = []
    temp_loss = []
    for i, layer in enumerate(style_layers):
        image_gram = gram_matrix(feats[layer])
        targets_gram = gram_matrix(style_targets[i])
        losses.append(loss(image_gram, targets_gram))
    ## To try to balance out losses in each layer: rescale every layer loss to
    ## the mean loss (the scale factor is treated as a constant), then weight it
    mean_loss = sum(losses).data / len(style_layers)
    for i, l in enumerate(losses):
        temp_loss.append((mean_loss / l.data) * l * style_weights[i])
    total_loss = sum(temp_loss) / len(style_layers)
    return total_loss
##############################################################################
# END OF YOUR CODE #
##############################################################################
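For comparison, a plain weighted style loss without any per-layer rebalancing can be sketched in NumPy as below. This is a simplified illustration, not the notebook's implementation: the helper names are hypothetical, the targets are passed in as precomputed Gram matrices, and the Gram normalization by `C * H * W` is an assumption.

```python
import numpy as np

def gram_matrix_np(features):
    """Hypothetical helper: (N, C, H, W) features -> (N, C, C), divided by C*H*W."""
    n, c, h, w = features.shape
    flat = features.reshape(n, c, h * w)
    return flat @ flat.transpose(0, 2, 1) / (c * h * w)

def style_loss_np(feats, style_layers, style_grams, style_weights):
    """Weighted sum over layers of squared Gram-matrix differences."""
    total = 0.0
    for layer, target, weight in zip(style_layers, style_grams, style_weights):
        diff = gram_matrix_np(feats[layer]) - target
        total += weight * np.sum(diff ** 2)
    return total

# Toy features for two "layers"; only layer 1 is used for style.
current = [np.zeros((1, 1, 1, 2)), np.array([[[[1.0, 1.0]]]])]
style = [np.zeros((1, 1, 1, 2)), np.array([[[[3.0, 1.0]]]])]
style_grams = [gram_matrix_np(style[1])]

loss_same = style_loss_np(style, [1], style_grams, [2.0])
loss_diff = style_loss_np(current, [1], style_grams, [2.0])
print(loss_same, loss_diff)
```

The loss is zero when the current image already has the style image's Gram matrices and grows with the squared mismatch, scaled by the per-layer weight.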
With these loss functions, you can now build your style transfer model. Implement the function below to perform style transfer. To test the model, you can use the content and style images that we have provided in Project1/style, or improvise using any image you like. Please save your output images in the Project1/style folder.
Design and carry out some experiments (on your own!) to analyse how the choice of layers and the weights will influence the output image. Write down your observations and analysis in the Markdown cell provided below.
def style_transfer(content_image, style_image, content_layer, content_weight,
                   style_layers, style_weights, max_iter):
    """
    Run style transfer!
    You may first resize the image to a small size for fast computation.
    Inputs:
    - content_image: filename of content image (base name, without extension)
    - style_image: filename of style image (base name, without extension)
    - content_layer: an index indicating which layer to use for content loss
    - content_weight: weighting on content loss
    - style_layers: list of indices indicating which layers to use for style loss
    - style_weights: list of weights to use for each layer in style_layers
    - max_iter: max iterations of gradient updates (per optimization stage)
    Returns:
    - output_image: an image with content from the content_image and
      style from the style image
    """
    ##############################################################################
    # TODO: Implement the function for style transfer.                           #
    ##############################################################################
    alpha = 1e-3
    beta = 100
    clamp_range = 2
    # Keep the file names: the arguments are rebound to image tensors below
    content_name, style_name = content_image, style_image
    style_image = preprocess(PIL.Image.open('style/' + style_name + '.jpg'), size=(448, 448)).to(device)
    content_image = preprocess(PIL.Image.open('style/' + content_name + '.jpg'), size=(448, 448)).to(device)
    ### Plot content and style images
    plt.imshow(deprocess(content_image.cpu()))
    plt.title('Content Image')
    plt.show()
    plt.imshow(deprocess(style_image.cpu()))
    plt.title('Style Image')
    plt.show()
    ### Extract features for content and style images
    featureExtractor = FeatureExtractor(model.features)
    content_features = featureExtractor(content_image)
    style_features = featureExtractor(style_image)
    style_targets = [style_features[i] for i in style_layers]
    ## Initialise random image
    img = torch.randn(1, 3, 448, 448).to(device).requires_grad_()
    # img = content_image.clone().to(device)
    ### Two optimization stages: a coarse one (lr=0.1), then a fine one (lr=0.005),
    ### each with a fresh Adam optimizer
    for stage, lr in enumerate((0.1, 0.005), start=1):
        print('Starting stage %d (lr=%g)' % (stage, lr))
        optimizer = torch.optim.Adam([img], lr=lr)
        for t in range(max_iter):
            img.data.clamp_(-clamp_range, clamp_range)
            optimizer.zero_grad()
            img_features = featureExtractor(img)
            loss_c = alpha * content_loss(content_weight, img_features[content_layer], content_features[content_layer])
            loss_s = beta * style_loss(img_features, style_layers, style_targets, style_weights)
            loss_total = loss_c + loss_s
            loss_total.backward()
            optimizer.step()
            if t == 0 or (t + 1) % 500 == 0 or t == max_iter - 1:
                print("step: %d, total_loss: %f, content loss: %f, style loss: %f"
                      % (t + 1, loss_total.item(), loss_c.item(), loss_s.item()))
                plt.imshow(deprocess(img.data.clone().clamp_(-clamp_range, clamp_range).cpu(), should_rescale=True))
                plt.gcf().set_size_inches(4, 4)
                plt.axis('off')
                plt.show()
    print('Complete')
    final_image = deprocess(img.data.clone().cpu())
    # Use the local names instead of relying on notebook-level globals
    final_image.save('style/Results/' + style_name + '+' + content_name + '.png')
    return final_image
    ##############################################################################
    #                             END OF YOUR CODE                               #
    ##############################################################################
##############################################################################
# TODO: 1. Choose one pair of images under 'Project1/style', and finish the  #
#          neural style transfer task by calling the style_transfer function.#
#       2. Show the 3 related images: content image, style image and the     #
#          generated style-transferred image.                                #
##############################################################################
### Feature extractor to get feature maps
class FeatureExtractor(torch.nn.Module):
    def __init__(self, submodule):
        super(FeatureExtractor, self).__init__()
        self.submodule = submodule

    def forward(self, x):
        # Run the input through each child module in order and keep every output
        outputs = []
        for name, module in self.submodule._modules.items():
            x = module(x)
            outputs.append(x)
        return outputs
style_image_file = 'starry_night'
content_image_file = 'tubingen'
content_layer = 3
content_weight = 1
style_layers =[1,2,3,4,6,7,9,10,11,12]
style_weights = np.array([10,10,1,1,1,1,2,2,2,2])
style_weights = style_weights/np.sum(style_weights)
max_iter = 1000
output_img = style_transfer(content_image_file, style_image_file, content_layer, content_weight, style_layers, style_weights, max_iter)
plt.imshow(output_img)
plt.gcf().set_size_inches(6, 6)
plt.axis('off')
plt.title('Final Image')
plt.show()
##############################################################################
# END OF YOUR CODE #
##############################################################################
Write your observations and analysis in this Markdown cell:
The images were optimized in two stages using the Adam optimizer. The first stage uses a higher learning rate of 0.1 to speed up the gradient descent, and the second stage uses a learning rate of 0.005 to prevent overshooting. The results can be found in the 'style/Results' folder.
From experimentation, good results were obtained with the following parameters:
Some results are shown below with these parameters
$\alpha$ and $\beta$ determine how much emphasis to place on the content and style images respectively. To verify this, $\alpha$ was set to 1e-3 while $\beta$ was set to 20, 100 and 500 respectively. The results shown below demonstrate that as $\beta$ increases, the output image becomes less like the content image, as expected.
The parameters used were
$\beta$=10 (left), $\beta$=100 (middle), $\beta$=500 (right)
The squeezenet has 13 feature layers, with conv2d and ReLU for layers 0 and 1, max pool layers at layers 2, 5 and 8 and Fire layers for the rest. Fire layers are made of conv2d and ReLU layers. To study how the choice of style layer affects the output image, the number of style layers was gradually increased with the following parameters:
The results are shown below. It was observed that the lower layers (1-3) mainly captured actual pixel values, the middle layers (4-6) captured textures such as brush strokes, and the upper layers (7-12) captured higher-level features such as the circular swirls in the style image.
Style transfer using layers 1-3 (left), 1-7(middle), 1-12(right)
With an idea of what each layer represents, the weights can then be chosen to emphasize specific features of the style image. This is shown using 'engineering' as the content image and 'starry_night' as the style image. By choosing specific weights, it is possible to choose between a 'night' scene and a 'day' scene for the output image. The following are the parameters:
By setting lower weights for the lower style layers, we can preserve the colours of the content image and produce a day scene with the brush strokes of the style image. By choosing higher weights for the lower style layers, the output image takes on the colours of the style image and produces a night scene.
Higher style weights for layer 9-12 (left), Higher style weights for layer 1-2 (right)
Similarly, the effect of the choice of content layer follows from this understanding of what each layer represents. When a lower layer is used, the result more closely resembles the actual content image. This is because lower layers more accurately represent actual pixel values, while higher layers represent higher-level features such as edges, textures and shapes.
The result is shown below for content layer = 1, 3, 5, 7, 9. As the layer number increases, more abstract features are kept instead of actual pixel values.
Output image as the selected content layer increases
Neural style transfer demonstrates how neural networks can be used not just to perform classification but also to understand images. A trained neural network stores information about the features in the image, with higher layers containing higher-level information.
After studying the various parameters, the following parameters have been found to work well: